223 research outputs found

    Data Set Modelability by QSAR

    We introduce a simple MODelability Index (MODI) that estimates the feasibility of obtaining predictive QSAR models (Correct Classification Rate above 0.7) for a binary dataset of bioactive compounds. MODI is defined as an activity class-weighted ratio of the number of nearest-neighbor pairs of compounds with the same activity class to the total number of pairs. MODI values were calculated for more than 100 datasets, and a threshold of 0.65 was found to separate non-modelable from modelable datasets.
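The definition above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes Euclidean distance in descriptor space and takes the "class-weighted ratio" to mean the per-class fraction of compounds whose nearest neighbor shares their class, averaged over classes. The function name `modi` and the toy data are hypothetical.

```python
from collections import defaultdict
from math import dist


def modi(descriptors, labels):
    """Mean over classes of the fraction of compounds whose nearest
    neighbor (Euclidean, an assumption) has the same activity class."""
    same = defaultdict(int)   # per class: nearest neighbor shares the class
    total = defaultdict(int)  # per class: number of compounds
    n = len(descriptors)
    for i in range(n):
        # Nearest neighbor of compound i among all other compounds.
        j = min((k for k in range(n) if k != i),
                key=lambda k: dist(descriptors[i], descriptors[k]))
        total[labels[i]] += 1
        if labels[j] == labels[i]:
            same[labels[i]] += 1
    return sum(same[c] / total[c] for c in total) / len(total)


# Toy binary dataset: two descriptors per compound (made-up values).
X = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.1), (1.0, 1.0), (1.1, 1.0), (0.3, 0.1)]
y = [0, 0, 0, 1, 1, 1]
score = modi(X, y)
print(f"MODI = {score:.3f}, modelable by the 0.65 threshold: {score >= 0.65}")
```

In this toy set, one compound of each class has its nearest neighbor in the other class, so each class contributes 2/3 and the index lands just above the 0.65 threshold.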

    Generating folded protein structures with a lattice chain growth algorithm

    We present a new application of the chain growth algorithm to lattice generation of protein structure and thermodynamics. Given the difficulty of ab initio protein structure prediction, this approach provides an alternative to current folding algorithms. The chain growth algorithm, unlike Metropolis folding algorithms, generates independent protein structures to achieve rapid and efficient exploration of configurational space. It is a modified version of the Rosenbluth algorithm where the chain growth transition probability is a normalized Boltzmann factor; it was previously applied only to simple polymers and protein models with two residue types. The independent protein configurations, generated segment-by-segment on a refined cubic lattice, are based on a single interaction site for each amino acid and a statistical interaction energy derived by Miyazawa and Jernigan. We examine for several proteins the algorithm’s ability to produce native-like folds and its effectiveness for calculating protein thermodynamics. Thermal transition profiles associated with the internal energy, entropy, and radius of gyration show characteristic folding/unfolding transitions and provide evidence for unfolding via partially unfolded (molten-globule) states. From the configurational ensembles, the protein structures with the lowest distance root-mean-square deviations (dRMSD) range from 2.2 to 3.8 Å, comparable to results of an exhaustive enumeration search. Though the ensemble-averaged dRMSD values are about 1.5 to 2 Å larger, the lowest dRMSD structures have similar overall folds to the native proteins. These results demonstrate that the chain growth algorithm is a viable alternative to protein simulations using the whole chain.
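The growth step described above, where each new segment is placed with probability given by a normalized Boltzmann factor, can be illustrated with a toy sketch. The assumptions are substantial: it uses a simple (not refined) cubic lattice and a single hypothetical contact energy `eps` for every non-bonded contact, in place of the Miyazawa-Jernigan residue-specific energies. The names `grow_chain` and `contact_energy` are illustrative only.

```python
import math
import random

# The six unit steps of a simple cubic lattice (the paper uses a refined lattice).
STEPS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def add(a, b):
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def contact_energy(site, prev, occupied, eps=-1.0):
    """Toy energy: eps per occupied lattice neighbor of `site`, excluding
    the bonded predecessor `prev` (i.e. per non-bonded contact)."""
    return sum(eps for s in STEPS
               if add(site, s) in occupied and add(site, s) != prev)


def grow_chain(n, temperature=1.0, rng=random):
    """Grow one n-site self-avoiding chain segment by segment.
    Returns (chain, weight), or (None, 0.0) if the walk gets trapped."""
    chain = [(0, 0, 0)]
    occupied = {chain[0]}
    weight = 1.0
    while len(chain) < n:
        end = chain[-1]
        free = [add(end, s) for s in STEPS if add(end, s) not in occupied]
        if not free:
            return None, 0.0  # dead end: no unoccupied neighbor
        boltz = [math.exp(-contact_energy(f, end, occupied) / temperature)
                 for f in free]
        # Transition probability = Boltzmann factor normalized over free sites.
        site = rng.choices(free, weights=boltz, k=1)[0]
        # Rosenbluth-style weight, used later to reweight ensemble averages.
        weight *= sum(boltz)
        chain.append(site)
        occupied.add(site)
    return chain, weight
```

Each call produces a configuration that is independent of the previous one, which is the contrast with Metropolis sampling drawn in the abstract. Ensemble averages would be formed as weighted means over many grown chains using the returned weights.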

    Development of Quantitative Structure−Binding Affinity Relationship Models Based on Novel Geometrical Chemical Descriptors of the Protein−Ligand Interfaces

    Novel geometrical chemical descriptors have been derived based on the computational geometry of protein-ligand interfaces and Pauling atomic electronegativities (EN). Delaunay tessellation has been applied to a diverse set of 517 X-ray characterized protein-ligand complexes, yielding a unique collection of interfacial nearest-neighbor atomic quadruplets for each complex. Each quadruplet composition was characterized by a single descriptor calculated as the sum of the EN values for the four participating atom types. We termed these simple descriptors, generated from atomic EN values and derived with the Delaunay tessellation, the ENTess descriptors and used them in variable selection k-Nearest Neighbor quantitative structure-binding affinity relationship (QSBR) studies of 264 diverse protein-ligand complexes with known binding constants. Twenty-four complexes with chemically dissimilar ligands were set aside as an independent validation set, and the remaining dataset of 240 complexes was divided into multiple training and test sets. The best models were characterized by a leave-one-out cross-validated correlation coefficient q² as high as 0.66 for the training set and a correlation coefficient R² as high as 0.83 for the test set. The high predictive power of these models was confirmed independently by applying them to the validation set of 24 complexes, yielding R² as high as 0.85. We conclude that QSBR models built with the ENTess descriptors can be instrumental for predicting the binding affinity of receptor-ligand complexes.
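The per-quadruplet descriptor, the sum of Pauling electronegativities of the four atom types, is simple enough to sketch. The example below assumes the interfacial quadruplets have already been obtained from a Delaunay tessellation (that step is not shown) and covers only four element types. The function name `en_sums` and the sample quadruplets are hypothetical.

```python
# Pauling electronegativities for a few common elements.
PAULING_EN = {"C": 2.55, "N": 3.04, "O": 3.44, "S": 2.58}


def en_sums(quadruplets):
    """Map each quadruplet composition (order-independent) to the sum of
    the Pauling EN values of its four atom types."""
    table = {}
    for quad in quadruplets:
        composition = tuple(sorted(quad))  # (C, O, N, C) == (C, C, N, O)
        table[composition] = round(sum(PAULING_EN[a] for a in composition), 2)
    return table


# Made-up quadruplets standing in for the output of a tessellation.
quads = [("C", "O", "N", "C"), ("C", "C", "N", "O"), ("O", "O", "C", "S")]
table = en_sums(quads)
for composition, value in sorted(table.items()):
    print("".join(composition), value)
```

Sorting the atom types makes the composition independent of vertex order, so the first two sample quadruplets map to the same descriptor.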

    Fishing out the signal in polypharmacological high-throughput screening data using novel navigator cheminformatics software

    Many drugs are characterized by polypharmacological mechanisms of action. Thus, prospective drug discovery studies often start by testing large compound libraries in multiple and diverse High-Throughput Screening (HTS) assays. These large heterogeneous data collections pose numerous computational challenges concerning processing, curation, and analysis of untreated output files generated by plate readers. We have developed the freely-accessible HTS Navigator software to enable and facilitate the processing and analysis of polypharmacological HTS data. We report on the capabilities of Navigator and present several case studies where we employed cheminformatics approaches embedded within the Navigator to curate and analyze large datasets of compounds tested toward different panels of targets

    Integrative Approaches for Predicting In Vivo Effects of Chemicals from their Structural Descriptors and the Results of Short-Term Biological Assays

    Cheminformatics approaches such as Quantitative Structure-Activity Relationship (QSAR) modeling have been used traditionally for predicting chemical toxicity. In recent years, high-throughput biological assays have been increasingly employed to elucidate mechanisms of chemical toxicity and predict toxic effects of chemicals in vivo. The data generated in such assays can be considered as biological descriptors of chemicals that can be combined with molecular descriptors and employed in QSAR modeling to improve the accuracy of toxicity prediction. In this review, we discuss several approaches for integrating chemical and biological data for predicting biological effects of chemicals in vivo and compare their performance across several data sets. We conclude that while no method consistently shows superior performance, the integrative approaches rank consistently among the best yet offer enriched interpretation of models over those built with either chemical or biological data alone. We discuss the outlook for such interdisciplinary methods and offer recommendations to further improve the accuracy and interpretability of computational models that predict chemical toxicity.

    Computer-Assisted Decision Support for Student Admissions Based on Their Predicted Academic Performance

    Objective. To develop predictive computational models forecasting the academic performance of students in the didactic-rich portion of a doctor of pharmacy (PharmD) curriculum as admission-assisting tools

    Chemistry-wide association studies (CWAS) to determine joint toxicity effects of co-occurring chemical features

    Individual structural alerts often fail to accurately predict chemical toxicity as they tend to overlook the moderating effects of other co-occurring alerts. Features are said to have statistical interaction effects when one changes or modulates the effect of another on the target property. Here we introduce Chemistry-Wide Association Study (CWAS; by analogy with GWAS in genomics) to systematically elicit the individual and interaction effects of chemical features on the target property

    Trust, But Verify: On the Importance of Chemical Structure Curation in Cheminformatics and QSAR Modeling Research

    Molecular modelers and cheminformaticians typically analyze experimental data generated by other scientists. Consequently, when it comes to data accuracy, cheminformaticians are always at the mercy of data providers who may inadvertently publish (partially) erroneous data. Thus, dataset curation is crucial for any cheminformatics analysis such as similarity searching, clustering, QSAR modeling, virtual screening, etc., especially now that the availability of chemical datasets in the public domain has skyrocketed. Despite the obvious importance of this preliminary step in the computational analysis of any dataset, there appears to be no commonly accepted guidance or set of procedures for chemical data curation. The main objective of this paper is to emphasize the need for a standardized chemical data curation strategy that should be followed at the onset of any molecular modeling investigation. Herein, we discuss several simple but important steps for cleaning chemical records in a database, including the removal of a fraction of the data that cannot be appropriately handled by conventional cheminformatics techniques. Such steps include the removal of inorganic and organometallic compounds, counterions, salts and mixtures; structure validation; ring aromatization; normalization of specific chemotypes; curation of tautomeric forms; and the deletion of duplicates. To emphasize the importance of data curation as a mandatory step in data analysis, we discuss several case studies where chemical curation of the original “raw” database enabled a successful modeling study (specifically, QSAR analysis) or resulted in a significant improvement of the model's prediction accuracy. We also demonstrate that in some cases rigorously developed QSAR models could even be used to correct erroneous biological data associated with chemical compounds. We believe that good practices for curation of chemical records outlined in this paper will be of value to all scientists working in the fields of molecular modeling, cheminformatics, and QSAR studies.

    AFLOW-ML: A RESTful API for machine-learning predictions of materials properties

    Full text link
    Machine learning approaches, enabled by the emergence of comprehensive databases of materials properties, are becoming a fruitful direction for materials analysis. As a result, a plethora of models have been constructed and trained on existing data to predict properties of new systems. These powerful methods allow researchers to target studies only at interesting materials, neglecting the non-synthesizable systems and those without the desired properties, thus reducing the amount of resources spent on expensive computations and/or time-consuming experimental synthesis. However, using these predictive models is not always straightforward. Often, they require a panoply of technical expertise, creating barriers for general users. AFLOW-ML (AFLOW Machine Learning) overcomes the problem by streamlining the use of the machine learning methods developed within the AFLOW consortium. The framework provides an open RESTful API to directly access the continuously updated algorithms, which can be transparently integrated into any workflow to retrieve predictions of electronic, thermal and mechanical properties. These types of interconnected cloud-based applications are envisioned to be capable of further accelerating the adoption of machine learning methods into materials development. Comment: 10 pages, 2 figures

    Prediction of binding affinity and efficacy of thyroid hormone receptor ligands using QSAR and structure-based modeling methods

    The thyroid hormone receptor (THR) is an important member of the nuclear receptor family that can be activated by endocrine disrupting chemicals (EDC). Quantitative Structure-Activity Relationship (QSAR) models have been developed to facilitate the prioritization of THR-mediated EDC for experimental validation. The largest database of binding affinities available at the time of the study for the ligand binding domain (LBD) of THRβ was assembled to generate both continuous and classification QSAR models with an external accuracy of R² = 0.55 and CCR = 0.76, respectively. In addition, for the first time a QSAR model was developed to predict binding affinities of antagonists inhibiting the interaction of coactivators with the AF-2 domain of THRβ (R² = 0.70). Furthermore, molecular docking studies were performed for a set of THRβ ligands (57 agonists and 15 antagonists of LBD, 210 antagonists of the AF-2 domain, supplemented by putative decoys/non-binders) using several THRβ structures retrieved from the Protein Data Bank. We found that two agonist-bound THRβ conformations could effectively discriminate their corresponding ligands from presumed non-binders. Moreover, one of the agonist conformations could discriminate agonists from antagonists. Finally, we have conducted virtual screening of a chemical library compiled by the EPA as part of the Tox21 program to identify potential THRβ-mediated EDCs using both QSAR models and docking. We concluded that the library is unlikely to have any EDC that would bind to the THRβ. Models developed in this study can be employed either to identify environmental chemicals interacting with the THR or, conversely, to eliminate the THR-mediated mechanism of action for chemicals of concern.
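For readers unfamiliar with the CCR figure quoted above, the following sketch shows how a Correct Classification Rate is commonly computed, assuming the usual definition as the mean of the per-class recall values (for two classes, the average of sensitivity and specificity). The labels and predictions are made up for illustration and are not data from the study.

```python
def ccr(y_true, y_pred):
    """Correct Classification Rate: the mean over classes of the fraction
    of that class's members that were predicted correctly."""
    rates = []
    for c in sorted(set(y_true)):
        members = [i for i, t in enumerate(y_true) if t == c]
        correct = sum(1 for i in members if y_pred[i] == c)
        rates.append(correct / len(members))
    return sum(rates) / len(rates)


# Hypothetical external set: 4 binders (1) and 6 non-binders (0).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
value = ccr(y_true, y_pred)
print(f"CCR = {value:.3f}")
```

Unlike plain accuracy, this measure is not inflated by a majority class, which matters for the imbalanced binder/non-binder sets typical of such studies.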